Modified policy iteration algorithms are not strongly polynomial for discounted dynamic programming

Authors

  • Eugene A. Feinberg
  • Jefferson Huang
  • Bruno Scherrer
Abstract

This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem with three states and four actions may grow arbitrarily. Therefore any such algorithm is not strongly polynomial. In particular, the modified policy iteration and λ-policy iteration algorithms are not strongly polynomial.
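For context on the algorithm family named in the abstract, the following is a minimal Python sketch of modified policy iteration for a finite discounted MDP: each iteration performs one greedy improvement backup followed by m partial policy-evaluation backups, so m = 0 recovers value iteration and m → ∞ recovers policy iteration. The array layout, parameter names, reward-maximization convention, and tolerance-based stopping rule are illustrative assumptions, not the construction used in the note.

```python
import numpy as np

def modified_policy_iteration(P, r, gamma, m=5, tol=1e-8, max_iter=10_000):
    """Minimal modified policy iteration sketch for a finite discounted MDP
    (reward maximization). Names and stopping rule are illustrative.

    P : (A, S, S) array, P[a, s, t] = transition probability
    r : (S, A) array of one-step rewards
    gamma : discount factor in [0, 1)
    m : number of partial policy-evaluation backups per iteration
        (m = 0 gives value iteration; m -> infinity gives policy iteration)
    """
    S, A = r.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # Policy improvement: greedy policy w.r.t. the current value estimate.
        q = r + gamma * np.einsum("ast,t->sa", P, v)
        pi = q.argmax(axis=1)
        v_new = q[np.arange(S), pi]
        # Partial policy evaluation: m extra backups under the fixed policy pi.
        P_pi = P[pi, np.arange(S), :]        # (S, S) transition matrix of pi
        r_pi = r[np.arange(S), pi]           # (S,) reward vector of pi
        for _ in range(m):
            v_new = r_pi + gamma * P_pi @ v_new
        if np.max(np.abs(v_new - v)) < tol:  # illustrative stopping rule
            return pi, v_new
        v = v_new
    return pi, v
```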


Related articles

The value iteration algorithm is not strongly polynomial for discounted dynamic programming

This note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow arbitrarily quickly with the size of the problem. In particular, the number of iterations can be exponential in the number of actions. Thus, unlike policy iter...
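For comparison with the modified policy iteration sketch above, here is a minimal value iteration sketch for the same class of finite discounted MDPs. Again, the array layout, parameter names, and tolerance-based stopping rule are illustrative assumptions rather than anything specified in the cited note.

```python
import numpy as np

def value_iteration(P, r, gamma, tol=1e-8, max_iter=10_000):
    """Minimal value iteration sketch for a finite discounted MDP
    (reward maximization). P : (A, S, S), r : (S, A), gamma in [0, 1)."""
    S, A = r.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # One Bellman optimality backup over all actions.
        q = r + gamma * np.einsum("ast,t->sa", P, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:  # illustrative stopping rule
            break
        v = v_new
    return q.argmax(axis=1), v_new
```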


Recent Progress on the Complexity of Solving Markov Decision Processes

The complexity of algorithms for solving Markov Decision Processes (MDPs) with finite state and action spaces has seen renewed interest in recent years. New strongly polynomial bounds have been obtained for some classical algorithms, while others have been shown to have worst case exponential complexity. In addition, new strongly polynomial algorithms have been developed. We survey these result...


A Unified Approach to Algorithms with a Suboptimality Test in Discounted Semi-Markov Decision Processes

This paper deals with computational algorithms for obtaining the optimal stationary policy and the minimum cost of a discounted semi-Markov decision process. Van Nunen [23] has proposed a modified policy iteration algorithm with a suboptimality test of MacQueen type, where the modified policy iteration algorithm is the policy iteration method with policy evaluation carried out by a finite number of...


A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes

The running time of classical algorithms for Markov Decision Processes (MDPs) typically grows linearly with the size of the state space, which frequently makes them intractable. This paper presents a Modified Policy Iteration algorithm for computing an optimal policy for large Markov decision processes under the infinite-horizon discounted reward criterion. The idea of this algorithm is based o...


General Dynamic Programming Algorithms Applied to Polling

We formulate the problem of scheduling a single server in a multi-class queueing system as a Markov decision process under the discounted cost and the average cost criteria. We develop a new implementation of the modified policy iteration (MPI) dynamic programming algorithm to efficiently solve problems with large state spaces and small action spaces. This implementation has an enhanced policy ev...



Journal:
  • Oper. Res. Lett.

Volume 42, Issue -

Pages -

Publication date: 2014